Mixing time of exponential random graphs 



Shankar Bhamidi* Guy Bresler^ Allan Sly* 



Abstract 

A variety of random graph models have been developed in recent years to study a range 
of problems on networks, driven by the wide availability of data from many social, telecom- 
munication, biochemical and other networks. A key model, extensively used in the sociology 
literature, is the exponential random graph model. This model seeks to incorporate in ran- 
dom graphs the notion of reciprocity, that is, the larger than expected number of triangles 
and other small subgraphs. Sampling from these distributions is crucial for parameter estimation,
hypothesis testing, and more generally for understanding basic features of the network
model itself. In practice sampling is typically carried out using Markov chain Monte Carlo,
in particular either the Glauber dynamics or the Metropolis-Hastings procedure.

In this paper we characterize the high and low temperature regimes of the exponential 
random graph model. We establish that in the high temperature regime the mixing time 
of the Glauber dynamics is $\Theta(n^2 \log n)$, where $n$ is the number of vertices in the graph; in
contrast, we show that in the low temperature regime the mixing is exponentially slow for any 
local Markov chain. Our results, moreover, give a rigorous basis for criticisms made of such 
models. In the high temperature regime, where sampling with MCMC is possible, we show 
that any finite collection of edges are asymptotically independent; thus, the model does not 
possess the desired reciprocity property, and is not appreciably different from the Erdős-Rényi
random graph.

1 Introduction

In the recent past there has been an explosion in the study of real-world networks including rail
and road networks, biochemical networks, data communication networks such as the Internet, 
and social networks. This has resulted in a concerted interdisciplinary effort to develop new 
mathematical network models to explain characteristics of observed real world networks, such 
as power law degree behavior, small world properties, and a high degree of clustering (see for 
example [1] [8] and the citations therein).

Clustering (or reciprocity) refers to the prevalence of triangles in a graph. This phenomenon
is most easily motivated in social networks, where nodes represent people and edges represent
relationships. The basic idea is that if two individuals share a common friend, then they are more
likely than otherwise to themselves be friends. However, most of the popular modern network 
models, such as the preferential attachment and the configuration models, are essentially tree-like 
and thus do not model the reciprocity observed in real social networks. 

One network model that attempts to incorporate reciprocity is the exponential random graph 
model. This model is especially popular in the sociology community. The model follows the 
statistical mechanics approach of defining a Hamiltonian to weight the probability measure on 
the space of graphs, assigning higher mass to graphs with "desirable" properties. While deferring 
the general definition of the model to Section 1.1, let us give a brief example. Fix parametric
constants $h, \beta > 0$ and for every graph $X$ on $n$ labeled vertices with $E(X)$ edges and $T(X)$
triangles, define the Hamiltonian of the graph as

$$H(X) = hE(X) + \beta T(X).$$

A probability measure on the space of graphs may then be defined as 

$$p_n(X) = \frac{e^{H(X)}}{Z}, \tag{1}$$

where Z is the normalizing constant often called the partition function. More generally, one can 
consider Hamiltonians on graphs which include counts $T_i(X)$ of different small subgraphs $G_i$,

$$H(X) = \sum_i \beta_i T_i(X).$$
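As a concrete illustration of the edge-triangle Hamiltonian and the measure (1), the following Python sketch evaluates $H(X)$ on a small graph and computes the normalizing constant $Z$ by brute force over all graphs on a handful of vertices. The function names, the 0-based vertex labels and the parameter values are assumptions made for the example only; the enumeration is feasible just for very small $n$.

```python
import math
from itertools import combinations


def edge_triangle_hamiltonian(n, edges, h, beta):
    """Return H(X) = h*E(X) + beta*T(X) for a graph on vertices 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    num_triangles = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))} <= edge_set
    )
    return h * len(edge_set) + beta * num_triangles


def partition_function(n, h, beta):
    """Brute-force Z: sum of exp(H(X)) over all 2^(n choose 2) graphs."""
    pairs = list(combinations(range(n), 2))
    total = 0.0
    for mask in range(1 << len(pairs)):
        edges = [pair for i, pair in enumerate(pairs) if (mask >> i) & 1]
        total += math.exp(edge_triangle_hamiltonian(n, edges, h, beta))
    return total


# A 4-cycle with one chord: 5 edges and 2 triangles.
square_with_chord = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
H = edge_triangle_hamiltonian(4, square_with_chord, h=-1.0, beta=0.5)
probability = math.exp(H) / partition_function(4, h=-1.0, beta=0.5)
```

The exponential cost of `partition_function` is the practical reason sampling is done by Markov chain Monte Carlo rather than by direct evaluation of (1).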

Social scientists use these models in several ways. The class of distributions (1) is an exponential
family, which allows for statistical inference of the parameters using the subgraph counts (which 
are sufficient statistics for the parameters involved). Sociologists carry out tests of significance, 
hoping to understand how prescription of local quantities such as the typical number of small 
subgraphs in the network affects more global macroscopic properties. Parameter estimation can 
be carried out either by maximum likelihood or, as is more commonly done, by simply equating 
the subgraph counts. Both procedures in general require sampling, in the case of maximum 
likelihood to estimate the normalizing constants. Thus, efficient sampling techniques are key to 
statistical inference on such models. At a more fundamental level, sociologists are interested in 
the question of how localized phenomena involving a small number of people determine the large
scale structure of the networks [16]. Sampling exponential random graphs and observing their 
large scale properties is one way this can be realized. Sampling is almost always carried out using 
local MCMC algorithms, in particular the Glauber dynamics or Metropolis-Hastings. These are
reversible ergodic Markov chains, which eventually converge to the stationary distribution $p_n(X)$.
However, our results show that the time to convergence can vary enormously depending on the 
choice of parameters. 

Our results: It is surprising that in spite of the practical importance of sampling from exponen- 
tial random graph distributions, there has been no mathematically rigorous study of the mixing 
time of any of the various Markov chain algorithms in this context. The goal of this paper is 
to fill this gap. We focus attention on the Glauber dynamics, one of the most popular Markov
chains. We provide the first rigorous analysis of the mixing time of the Glauber dynamics for the 
above stationary distribution and do so in a very general setup. In the process we give a rigorous 
definition of the "high temperature" phase, where the Gibbs distribution is unimodal and the
Glauber dynamics converges quickly to the stationary distribution, and the "low temperature"
phase, where the Gibbs distribution is multimodal and the Glauber dynamics takes an expo- 
nentially long time to converge to the stationary distribution. While a complete understanding 
of the Gibbs distribution in the low temperature phase remains out of reach (see, however, the 
important work of Sourav Chatterjee in the case of triangles [6]), we can nevertheless show that 
the distribution has poor conductance, thereby establishing exponentially slow mixing for any 
local Markov chain with the specified stationary distribution. 

Our results, moreover, give a rigorous basis for criticisms made of such models. In the high 
temperature regime, where sampling with MCMC is possible, we show that any finite collection 
of edges are asymptotically independent. Also, we show that with exponentially high probability a 
sampled graph is weakly pseudorandom, meaning that it satisfies a number of equivalent properties 
(such as high edge expansion) shared by Erdős-Rényi random graphs. Thus, the model does not
possess the desired reciprocity property, and is not appreciably different from the Erdős-Rényi
random graph. 

Relevant literature: There is a large body of literature, especially in the social networking 
community, on exponential random graph models. We shall briefly mention just some of the 
relevant literature and how it relates to our results (see [14], [16], [3] and the references therein for
more background). The pioneering article in this area by Frank and Strauss [11] introduced the
concept of Markov graphs. Markov graphs are a special case of exponential random graphs in
which the subgraphs are restricted to stars or triangles. Extending this methodology,
Wasserman and Pattison [17] introduced general subgraph counts. However, from the outset a
number of researchers noted problems at the empirical level for their Markov chain algorithms, 
depending on parameter values. See [16] for a relevant discussion of empirical findings as well as 
several new specifications of the model to circumvent such issues. 

On the theoretical side, Sourav Chatterjee [6], in his recent work characterizing the large deviation
properties of Erdős-Rényi random graphs, developed mathematical techniques that can be used to
study the distribution of these random graphs. At the statistical physics (non-rigorous) level Mark
Newman and his co-authors have studied the case where the subgraphs are triangles and 2-stars.
In this setting, using mean-field approximations, they predicted a phase transition between a
high-symmetry phase, with graphs exhibiting only a mild amount of reciprocity, and a degenerate
symmetry-broken phase with either high or low edge density (see [TH] and [12]).

1.1 Definitions and Notation 

This section contains a precise mathematical definition of the model and the Markov chain 
methodology used in this paper. We work on the space $\mathcal{G}_n$ of all graphs on $n$ vertices with
vertex set $[n] := \{1, 2, \ldots, n\}$. We shall use $X = (x_e)$ to denote a graph from $\mathcal{G}_n$, where for every
edge $e = (i, j)$, $x_e$ is 1 if the edge between vertices $i$ and $j$ is present and 0 otherwise. For simplicity,
we shall often write $X(e)$ for $x_e$. The exponential random graph model is defined in terms of the
number of subgraphs $G$ (e.g., triangles or edges) contained in $X$. It will be convenient to define
these subgraph counts as follows. Fix a graph $G$ on the vertex set $1, 2, \ldots, m$. Let $[n]_m$ denote the
set of all $m$-tuples of distinct elements:

$$[n]_m := \{(v_1, \ldots, v_m) : v_i \in [n],\ v_1 \neq v_2 \neq \cdots \neq v_m\}.$$

We shall denote such an $m$-tuple of distinct vertices by $\mathbf{v}_m$. In a graph $X$, for any $m$ distinct
vertices $\mathbf{v}_m$, let $H_X(\mathbf{v}_m)$ denote the subgraph of $X$ induced by $\mathbf{v}_m$. Say that $H_X(\mathbf{v}_m)$ contains
$G$, denoted by $H_X(\mathbf{v}_m) \cong G$, if whenever the edge $(i, j)$ is present in $G$, then the edge $(v_i, v_j)$ is
present in $H_X(\mathbf{v}_m)$ for all $1 \le i \neq j \le m$. For a configuration $X \in \mathcal{G}_n$ and a fixed graph $G$
define the count

$$N_G(X) = \sum_{\mathbf{v}_m \in [n]_m} 1\{H_X(\mathbf{v}_m) \cong G\}. \tag{2}$$

This definition is equivalent to the usual exponential random graph model up to adjustments 
in the constants $\beta$ by multiplicative factors. It counts subgraphs multiple times; for instance a
triangle will be counted 6 times and in general a graph G with k automorphisms will be counted 
k times. By dividing the parameters by this multiplicative factor we reduce to the usual 
definition. 
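The counting convention in (2) can be checked by brute force on a small graph. The sketch below enumerates ordered tuples of distinct vertices and counts those on which $G$ embeds; the function name and the 0-based vertex labels are illustrative choices, not notation from the text. On a graph with one triangle and four edges it confirms that the triangle is counted 6 times and each edge twice.

```python
from itertools import permutations


def subgraph_count(n, x_edges, m, g_edges):
    """N_G(X): ordered m-tuples of distinct vertices of X on which G embeds.

    G is given by its edge list on vertices 0..m-1, X by its edge list
    on vertices 0..n-1.
    """
    x = {frozenset(e) for e in x_edges}
    count = 0
    for v in permutations(range(n), m):
        # every edge (i, j) of G must map to an edge (v_i, v_j) of X
        if all(frozenset((v[i], v[j])) in x for i, j in g_edges):
            count += 1
    return count


triangle = [(0, 1), (1, 2), (0, 2)]
single_edge = [(0, 1)]
x_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]  # one triangle plus a pendant edge

n_triangle = subgraph_count(4, x_edges, 3, triangle)
n_edge = subgraph_count(4, x_edges, 2, single_edge)
```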

In our proof we shall also need more advanced versions of the above counts which we define now. 
Fix an edge $e = (a, b) \in X$. The subgraph count of $G$ in $X \cup \{e\}$ containing edge $e$ is defined as:

$$N_G(X, e) = \sum_{\mathbf{v}_m \in [n]_m,\ \mathbf{v}_m \ni a, b} 1\{H_{X \cup \{e\}}(\mathbf{v}_m) \cong G\}.$$

Similarly, for two edges $e = (a, b)$ and $e' = (c, d)$ define the subgraph counts of $G$ in $X \cup \{e, e'\}$
and containing edges $e, e'$ by

$$N_G(X, e, e') = \sum_{\mathbf{v}_m \in [n]_m,\ \mathbf{v}_m \ni a, b, c, d} 1\{H_{X \cup \{e, e'\}}(\mathbf{v}_m) \cong G\}.$$

Gibbs measure: We now define the probability measure on the space $\mathcal{G}_n$. Fix $s \ge 1$ and fix
graphs $G_1, G_2, \ldots, G_s$ with $G_i$ a graph on $|V_i|$ labelled vertices, with $|V_i| \le L$ and with edge set
$E_i$. For simplicity we shall think of $G_i$ as a graph on the vertex set $1, 2, \ldots, |V_i|$. By convention we
shall always let $G_1$ denote the edge graph consisting of the graph with vertex set $1, 2$ and edge
set $\{(1, 2)\}$. In this notation, for any configuration $X \in \mathcal{G}_n$ the quantity $N_{G_1}(X)$ will be twice the
number of edges in $X$. With this convention, fix constants $\beta_1, \beta_2, \ldots, \beta_s$ with $\beta_i > 0$ for $i \ge 2$ and
$\beta_1 \in \mathbb{R}$. The exponential random graph probability measure is defined as follows.



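Definition 1 For $G_1, \ldots, G_s$ and constants $\beta = (\beta_1, \ldots, \beta_s)$ as above, the Gibbs measure on the
space $\mathcal{G}_n$ is defined as the probability measure

$$p_n(X) = \frac{1}{Z_n(\beta)} \exp\left\{ \sum_{i=1}^{s} \beta_i \frac{N_{G_i}(X)}{n^{|V_i|-2}} \right\}, \qquad X \in \mathcal{G}_n. \tag{3}$$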



Here $Z_n(\beta)$ is the normalizing factor and is often called the partition function. For simplicity we
have suppressed the dependence of the measure on the vector $\beta$. Also, note the normalization
of the subgraph count of $G_i$ by the factor $n^{|V_i|-2}$, so that the contribution of each factor scales
properly and is of order $n^2$ in the large $n$ limit. Setting $\beta_i > 0$ for $i \ge 2$ makes the Gibbs measure
a monotone (also ferromagnetic) system which will be important for our proof. The term $\beta_1$ does
not affect the interaction between edges and plays the role of an external field in this model;
adjusting $\beta_1$ makes it more or less likely for edges to be included.

The term in the exponent is often called the Hamiltonian and we shall denote it by:

$$H(X) = \sum_{i=1}^{s} \beta_i \frac{N_{G_i}(X)}{n^{|V_i|-2}}.$$

Note that $H(X) : \{0, 1\}^{\binom{n}{2}} \to \mathbb{R}_+$ is a function of Boolean variables $X(e)$ and has an
elementary Fourier decomposition in terms of the basis functions $\prod_{e \in S} X(e)$, where $S$ runs over
all possible subsets of edges. Thus, with respect to any fixed edge $e$, we can decompose the above
Hamiltonian as

$$H(X) = A_e(X) + B_e(X),$$

where $A_e$ consists of all terms dependent on edge $e$ and $B_e(X)$ denotes all terms independent
of edge $e$. Let $X_{e+}$ denote the configuration of edges which coincides with $X$ for edges $e' \neq e$
and has $X_{e+}(e) = 1$. The partial derivative with respect to the edge $e$ of the Hamiltonian $H$,
evaluated at a configuration $X$, is defined by the formula

$$\partial_e H(X) = A_e(X_{e+}).$$

The higher derivatives $\partial_e \partial_{e'}$ for $e \neq e'$ are defined similarly by iterating the above definition.

Glauber dynamics and local chains: The Glauber dynamics is an ergodic reversible Markov
chain with stationary distribution $p_n(\cdot)$, where at each stage exactly one edge is updated. It is
defined as follows:

Definition 2 Given the Gibbs measure above, the corresponding Glauber dynamics is a discrete
time ergodic Markov chain on $\mathcal{G}_n$. Given the current state $X$, the next state $X'$ is obtained
by choosing an edge $e$ uniformly at random and letting $X' = X_{e+}$ with probability proportional to
$p_n(X_{e+})$ and $X' = X_{e-}$ with probability proportional to $p_n(X_{e-})$. Here $X_{e+}$ is the graph which
coincides with $X$ for all edges other than $e$ and $X_{e+}(e) = 1$. Similarly $X_{e-}$ is the graph which
coincides with $X$ for all edges other than $e$ and $X_{e-}(e) = 0$.

There are various other chains that can also be used to simulate the above Gibbs measure. Call
a chain on $\mathcal{G}_n$ local if at most $o(n)$ edges are updated in each step. The transition rates for the
Glauber dynamics satisfy the following relation:

Lemma 3 Given that we chose edge $e$ to update, the probability of the transition $X \to X_{e+}$ is
$\frac{\exp(\partial_e H(X))}{1 + \exp(\partial_e H(X))}$ and the probability of the transition $X \to X_{e-}$ is $\frac{1}{1 + \exp(\partial_e H(X))}$.
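To make the update rule concrete, here is a minimal Python sketch of the Glauber dynamics for one particular choice of subgraphs, namely $G_1$ an edge and $G_2$ a triangle. Under that assumption and the normalization of the Gibbs measure above, the derivative for $e = (a, b)$ works out to $2\beta_1 + 6\beta_2 c(a, b)/n$, where $c(a, b)$ is the number of common neighbours of $a$ and $b$ (the factors 2 and 6 come from counting ordered tuples). The function names, the adjacency-set representation and the 0-based labels are choices made for the example, not part of the paper.

```python
import math
import random


def glauber_step(adj, n, beta1, beta2, rng):
    """One Glauber update for the edge-triangle model (G_1 = edge, G_2 = triangle)."""
    a, b = rng.sample(range(n), 2)  # uniformly random pair of distinct vertices
    common = len(adj[a] & adj[b])   # triangles that the edge (a, b) would close
    # d_e H(X) = beta1 * N_{G1}(X, e) + beta2 * N_{G2}(X, e) / n, with
    # N_{G1}(X, e) = 2 and N_{G2}(X, e) = 6 * common (ordered-tuple counts).
    d_e_h = 2.0 * beta1 + 6.0 * beta2 * common / n
    p_plus = 1.0 / (1.0 + math.exp(-d_e_h))  # equals exp(d) / (1 + exp(d))
    if rng.random() < p_plus:
        adj[a].add(b)
        adj[b].add(a)
    else:
        adj[a].discard(b)
        adj[b].discard(a)


def run_glauber(n, beta1, beta2, steps, seed=0):
    """Run the chain from the empty graph and return the adjacency sets."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for _ in range(steps):
        glauber_step(adj, n, beta1, beta2, rng)
    return adj


def edge_density(adj):
    """Fraction of the n(n-1)/2 possible edges that are present."""
    n = len(adj)
    return sum(len(nbrs) for nbrs in adj.values()) / (n * (n - 1))
```

With $\beta_2 = 0$ the edges are updated independently, so the chain reduces to resampling an Erdős-Rényi graph with edge probability $e^{2\beta_1}/(1 + e^{2\beta_1})$, which gives a simple sanity check.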

Mixing time: We will be interested in the time it takes for the Glauber dynamics to get close to
the stationary distribution given by the Gibbs measure (3). The mixing time $\tau_{mix}$ of a Markov
chain is defined as the number of steps needed in order to guarantee that the chain, starting from
an arbitrary state, is within total variation distance $e^{-1}$ from the stationary distribution.
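For a chain small enough to write down its transition matrix, this definition can be evaluated directly. The sketch below is only an illustration of the definition on a toy two-state chain (it is not a method used in the paper, and the function names are invented for the example): it propagates the distribution from every starting state and returns the first time at which the worst-case total variation distance drops to $e^{-1}$.

```python
import math


def total_variation(p, q):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mixing_time(transition, stationary, max_steps=10_000):
    """Smallest t with max_x TV(P^t(x, .), stationary) <= 1/e, or None."""
    size = len(transition)
    # rows[x] holds the distribution of the chain after t steps started from x
    rows = [[1.0 if i == x else 0.0 for i in range(size)] for x in range(size)]
    for t in range(1, max_steps + 1):
        rows = [
            [sum(row[k] * transition[k][j] for k in range(size)) for j in range(size)]
            for row in rows
        ]
        if max(total_variation(row, stationary) for row in rows) <= math.exp(-1):
            return t
    return None


# Toy example: a sticky two-state chain with uniform stationary distribution.
two_state = [[0.9, 0.1], [0.1, 0.9]]
t_mix = mixing_time(two_state, [0.5, 0.5])
```

For this toy chain the worst-case distance after $t$ steps is $0.5 \cdot 0.8^t$, so two steps suffice. For the exponential random graph model the state space has $2^{\binom{n}{2}}$ elements, which is why the paper bounds the mixing time analytically instead.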

We mention the following fundamental result which draws a connection between total variation 
distance and coupling. It allows us to conclude that if we can couple two versions of the Markov 
chains started from different states quickly, the chain mixes quickly. The following lemma is well 
known, see e.g. [2]. 

Lemma 4 (Mixing time Lemma) For a Markov chain $X$, suppose there exist two coupled
copies, $Y$ and $Z$, such that each is marginally distributed as $X$ and

$$\max_{y, z} P(Y_{t_0} \neq Z_{t_0} \mid Y_0 = y, Z_0 = z) \le (2e)^{-1}.$$

Then the mixing time of $X$ satisfies $\tau_{mix} \le t_0$.

Because the exponential random graph model is a monotone system, we can couple the Glauber
dynamics so that if $X(0) \le Y(0)$, then for all $t$, $X(t) \le Y(t)$. This inequality is a partial ordering
meaning that the edge set of $X$ is a subset of the edge set of $Y$. This is known as the monotone
coupling and, by monotonicity, Lemma 4 reduces to bounding the time until chains starting from
the empty and complete graphs couple.
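The monotone coupling can be sketched in a few lines of Python for the edge-triangle case ($G_1$ an edge, $G_2$ a triangle, an assumption made for the example): both chains update the same edge using the same uniform random number, so that for $\beta_2 \ge 0$ the chain started from the empty graph always remains a subgraph of the chain started from the complete graph. The names below are illustrative and not taken from the paper.

```python
import math
import random


def p_plus(adj, a, b, n, beta1, beta2):
    """Probability of including edge (a, b) in the edge-triangle model."""
    common = len(adj[a] & adj[b])
    return 1.0 / (1.0 + math.exp(-(2.0 * beta1 + 6.0 * beta2 * common / n)))


def coalescence_time(n, beta1, beta2, max_steps, seed=0):
    """First time the chains from the empty and complete graphs agree, or None."""
    rng = random.Random(seed)
    lower = {v: set() for v in range(n)}                # empty graph
    upper = {v: set(range(n)) - {v} for v in range(n)}  # complete graph
    for t in range(1, max_steps + 1):
        a, b = rng.sample(range(n), 2)
        u = rng.random()  # shared by both chains: this is the coupling
        for adj in (lower, upper):
            if u < p_plus(adj, a, b, n, beta1, beta2):
                adj[a].add(b)
                adj[b].add(a)
            else:
                adj[a].discard(b)
                adj[b].discard(a)
        # Monotonicity check: valid when beta2 >= 0.
        assert all(lower[v] <= upper[v] for v in range(n))
        if lower == upper:
            return t
    return None
```

Once the two extreme chains have met, every chain sandwiched between them has met as well, which is how Lemma 4 is applied.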

With the above definitions of the Gibbs measure, the following functions determine the properties
of the mixing time. Define for fixed $\beta \in \mathbb{R} \times (\mathbb{R}_+)^{s-1}$ the functions

$$\Psi_\beta(p) = \sum_{i=1}^{s} 2 \beta_i |E_i| p^{|E_i|-1}, \qquad \varphi_\beta(p) = \frac{\exp(\Psi_\beta(p))}{1 + \exp(\Psi_\beta(p))}.$$

Note that $\varphi_\beta$ is a smooth, strictly increasing function on the unit interval. Since $\varphi_\beta(0) > 0$ and
$\varphi_\beta(1) < 1$ the equation $\varphi_\beta(p) = p$ has at least one solution, denoted by $p^*$. If this solution is
unique and not an inflection point, then $0 < \varphi'_\beta(p^*) < 1$. The function $\varphi_\beta(p)$ has the following
loose motivation: if $X$ is a graph chosen according to the Erdős-Rényi distribution $G(n, p)$, then
with high probability all edge update probabilities $\frac{\exp(\partial_e H(X))}{1 + \exp(\partial_e H(X))}$ are approximately $\varphi_\beta(p)$.

Phase identification: We now describe the high and low temperature phases of this model.
Recall that our parameter space is $B = \mathbb{R} \times (\mathbb{R}_+)^{s-1}$. We call $p \in [0, 1]$ a fixed point if $\varphi_\beta(p) = p$.

High temperature phase: We say that a $\beta \in B$ belongs to the high temperature phase if $\varphi_\beta(p) = p$
has a unique fixed point $p^*$ which satisfies

$$\varphi'_\beta(p^*) < 1. \tag{4}$$

Low temperature phase: We say that a $\beta \in B$ belongs to the low temperature phase if $\varphi_\beta(p) = p$
has at least two fixed points $p^*$ which satisfy $\varphi'_\beta(p^*) < 1$.

Values of $\beta$ not in either phase are said to be critical points. They occur when one of the
fixed points is an inflection point of $\varphi_\beta$. These critical points form an $(s-1)$-dimensional manifold
which is in the intersection of the closure of the high and low temperature phases. For simplicity,
in the proof we shall suppress the dependence of the functions on $\beta$ and write $\varphi$ for $\varphi_\beta$ and $\Psi$ for
$\Psi_\beta$.
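The phase of a given parameter vector can be explored numerically. The Python sketch below assumes the form $\Psi_\beta(p) = \sum_i 2\beta_i |E_i| p^{|E_i|-1}$ and $\varphi_\beta = e^{\Psi_\beta}/(1 + e^{\Psi_\beta})$ (the form consistent with the estimates in the proof of Lemma 11), locates the fixed points of $\varphi_\beta$ by a grid scan followed by bisection, and classifies the phase by the derivative at each fixed point. The function names and the grid resolution are arbitrary choices for illustration; a grid scan can miss tangential (critical) fixed points, so the output is a heuristic, not a proof.

```python
import math


def psi(p, betas, edge_counts):
    """Psi_beta(p) = sum_i 2 * beta_i * |E_i| * p^(|E_i| - 1)."""
    return sum(2.0 * b * k * p ** (k - 1) for b, k in zip(betas, edge_counts))


def phi(p, betas, edge_counts):
    return 1.0 / (1.0 + math.exp(-psi(p, betas, edge_counts)))


def phi_prime(p, betas, edge_counts):
    d_psi = sum(
        2.0 * b * k * (k - 1) * p ** (k - 2)
        for b, k in zip(betas, edge_counts)
        if k >= 2  # the single-edge term is constant in p
    )
    f = phi(p, betas, edge_counts)
    return f * (1.0 - f) * d_psi


def fixed_points(betas, edge_counts, grid=2000):
    """Roots of phi(p) = p on [0, 1], found by a grid scan plus bisection."""
    def g(p):
        return phi(p, betas, edge_counts) - p

    roots = []
    for k in range(grid):
        lo, hi = k / grid, (k + 1) / grid
        if g(lo) * g(hi) > 0:
            continue  # no sign change on this cell
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if g(lo) * g(mid) <= 0:
                hi = mid
            else:
                lo = mid
        root = 0.5 * (lo + hi)
        if not roots or abs(root - roots[-1]) > 1e-9:
            roots.append(root)  # skip duplicates from roots on a grid point
    return roots


def classify(betas, edge_counts):
    roots = fixed_points(betas, edge_counts)
    stable = [p for p in roots if phi_prime(p, betas, edge_counts) < 1.0]
    if len(roots) == 1 and len(stable) == 1:
        return "high temperature"
    if len(stable) >= 2:
        return "low temperature"
    return "critical (or unresolved numerically)"
```

For the edge-triangle model one passes `edge_counts=[1, 3]`; for instance `classify([0.0, 0.1], [1, 3])` and `classify([-2.0, 1.5], [1, 3])` fall in different phases.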



1.2 Results 



The first two results show that the high and low temperature phases determine the mixing time 
for local Markov chains. 

Theorem 5 (High temperature) If $\beta$ is in the high temperature regime then the mixing
time of the Glauber dynamics is $\Theta(n^2 \log n)$.

Theorem 6 (Low temperature) If $\beta$ is in the low temperature regime then the mixing time
of the Glauber dynamics is $e^{\Omega(n)}$. Furthermore, this holds not only for the Glauber dynamics but
for any local dynamics on $\mathcal{G}_n$.

The next theorem shows that the exponential random graph model is not appreciably different
from the Erdős-Rényi random graph model in the high temperature regime where sampling is possible.

Theorem 7 (Asymptotic independence of edges) Let $X$ be drawn from the exponential random
graph distribution in the high temperature phase. Let $e_1, \ldots, e_k$ be an arbitrary collection of
edges with associated indicator random variables $x_{e_i} = 1(e_i \in X)$. Then for any $\epsilon > 0$, there is
an $n$ such that for all $(a_1, \ldots, a_k) \in \{0, 1\}^k$ the random variables $x_{e_1}, \ldots, x_{e_k}$ satisfy

$$\left| P(x_{e_1} = a_1, \ldots, x_{e_k} = a_k) - (p^*)^{\sum a_i} (1 - p^*)^{k - \sum a_i} \right| \le \frac{\epsilon}{n^{|V|}}.$$

Thus, the random variables $x_{e_1}, \ldots, x_{e_k}$ are asymptotically independent.

A consequence is that a graph sampled from the exponential random graph distribution is with
high probability weakly pseudo-random (see [13] or [7]). This means that it satisfies a number of
equivalent properties, including large spectral gap and correct number of subgraph counts, that
make it very similar to an Erdős-Rényi random graph.

Corollary 8 (Weak pseudo-randomness) With probability $1 - o(1)$ an exponential random
graph is weakly pseudo-random.



1.3 Idea of the proof 

We give a summary of the main ideas of the proof: 

• Consider first the high temperature phase. A natural approach to bounding the coupling
time, and hence the mixing time by Lemma 4, is to use the technique of path coupling [5].
In path coupling, instead of trying to couple from every pair of states, we try to show that
for any pair of states $x$ and $y$ that differ in a single edge there exists a coupling of two copies
of the chain started at $x$ and $y$ such that

$$E(d_H(X(1), Y(1)) \mid X(0) = x, Y(0) = y) \le 1 - \beta \tag{5}$$

for some $\beta = \beta(n)$, where $d_H$ is the Hamming distance. However, this approach fails for
some $\varphi_\beta$ in the high temperature regime when $\sup_{0 \le p \le 1} \varphi'(p) > 1$.

• It turns out that the configurations in the high temperature regime where path coupling
fails are very rare under the Gibbs measure. We therefore define a set (a neighborhood of
the unique fixed point $p^*$, where $\varphi'_\beta(p^*) < 1$), in which path coupling does give a contraction. More
precisely, for a configuration $X$, define

$$r_G(X, e) = \left( \frac{N_G(X, e)}{2 |E| n^{|V|-2}} \right)^{\frac{1}{|E|-1}}.$$

This is (asymptotically) the maximum likelihood choice for the parameter $p$ of the Erdős-Rényi
random graph on $n$ vertices, $G(n, p)$, having observed $N_G(X, e)$ subgraphs $G$ containing
the edge $e$. Let $\mathbf{G}_L$ denote the class of all graphs with at most $L$ vertices, where
$L$ is some integer greater than or equal to $\max_i |V_i|$. What we prove is that for $\epsilon$ small
enough, if the two configurations $x$ and $y$ belong to the set

$$\mathbf{G} := \{X : \max_{G \in \mathbf{G}_L,\, e} |r_G(X, e) - p^*| < \epsilon\}$$

then equation (5) holds for $\beta(n) = \delta/n^2$ for some $\delta > 0$. Thus, starting from any state $x$, if
we can show that in a small number of steps ($O(n^2)$ is enough) we reach $\mathbf{G}$, then a variant of
path coupling proves rapid mixing. This preliminary stage where we run the Markov chain
for some steps so that it reaches a "good configuration" is termed the burn in phase. This
approach has been used before, particularly in proving mixing times for random colourings,
for example in [9].

• To show that we enter the good set $\mathbf{G}$ quickly, we control all the $r_G(X, e)$, for all subgraphs
$G \in \mathbf{G}_L$ simultaneously, and via a coupling with biased random walks show that with
exponentially high probability for large $n$, within $O(n^2)$ steps we reach the set $\mathbf{G}$. We
crucially make use of the monotonicity of the system here by writing the drifts in terms
of the $r_G(X, e)$ and bounding them by their maximum. This completes the proof for the
rapid mixing in the high temperature phase. This also shows how in the high temperature
phase, most of the Gibbs measure of the exponential random graph model is concentrated
on configurations which are essentially indistinguishable from the Erdős-Rényi $G(n, p^*)$
random graph model.

• In the low temperature phase we use a conductance argument to show slow mixing for any
Markov chain that updates $o(n)$ edges per time step. The argument makes use of the same
random walk argument used in the burn in stage to bound the measure of certain sets of
configurations under the Gibbs measure. Specifically, we show that for every fixed point
$p^*$ of the equation $\varphi(p) = p$ with $\varphi'(p^*) < 1$, the Glauber dynamics allows an exponentially
small flow of probability to leave the set of configurations that are nearly indistinguishable
from an Erdős-Rényi random graph with parameter $p^*$. Because the stationary distribution
of the Glauber dynamics is the Gibbs measure, this allows us to bound the relative measure
of the sets under consideration, thereby showing that if we have two or more fixed points $p^*$,
then it takes an exponentially long time for configurations to leave the set of configurations
indistinguishable from an Erdős-Rényi random graph with parameter $p^*$. Thus mixing takes
an exponentially long time.
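The statistic $r_G(X, e)$ that drives the burn-in argument can be illustrated numerically for triangles. The sketch below assumes the normalization $r_G(X, e) = (N_G(X, e) / (2|E| n^{|V|-2}))^{1/(|E|-1)}$, which is what the identity $N_G(X, e) = 2|E| n^{|V|-2} r(G, e)^{|E|-1}$ used later in the proof of Lemma 11 suggests. For a triangle, $N_G(X, e)$ is 6 times the number of common neighbours of the endpoints of $e$, so $r_G(X, e)$ is the square root of the fraction of vertices adjacent to both endpoints. The helper names and the parameters $n = 120$, $p = 0.3$ are chosen only for the example.

```python
import random
from itertools import combinations


def sample_erdos_renyi(n, p, rng):
    """Sample G(n, p) as a dict of adjacency sets."""
    adj = {v: set() for v in range(n)}
    for a, b in combinations(range(n), 2):
        if rng.random() < p:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def r_triangle(adj, a, b):
    """r_G(X, e) for G = triangle: |E| = 3, |V| = 3, N_G(X, e) = 6 * common."""
    n = len(adj)
    n_g = 6 * len(adj[a] & adj[b])
    return (n_g / (2 * 3 * n ** (3 - 2))) ** (1.0 / (3 - 1))


rng = random.Random(0)
n, p = 120, 0.3
adj = sample_erdos_renyi(n, p, rng)
values = [r_triangle(adj, a, b) for a, b in combinations(range(n), 2)]
mean_r = sum(values) / len(values)
```

On an Erdős-Rényi sample the values cluster around $p$, which is the sense in which $r_G(X, e)$ estimates the edge density seen from the edge $e$.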



2 Proof of the main results 
2.1 Subgraph counts 

Before starting the proof we need a couple of simple lemmas on the subgraph counts. For a graph
$X \in \mathcal{G}_n$ recall the subgraph counts $N_G(X)$ of a predefined graph $G$ on $m$ nodes as well as the
counts in $X$ of the subgraphs containing edges, namely $N_G(X, e)$ and $N_G(X, e, e')$ as defined in
Section 1.1.

The following lemma records the quantities $N_G(X)$, $N_G(X, e)$, and $N_G(X, e, e')$ for the complete
graph $X = K_n$.

Lemma 9 Consider the complete graph on $n$ vertices, $K_n$, and let $N_G(K_n)$, $N_G(K_n, e)$, and
$N_G(K_n, e, e')$ be defined as above. Then

(a)
$$N_G(K_n) = \binom{n}{|V|} |V|! \sim n^{|V|}$$

(b)
$$N_G(K_n, e) = 2 |E| \binom{n-2}{|V|-2} (|V|-2)! \sim 2 |E| n^{|V|-2}$$

(c) For a fixed edge $e$ we have
$$\sum_{e' \neq e} N_G(K_n, e, e') = (|E| - 1) N_G(K_n, e) \sim 2 |E| (|E| - 1) n^{|V|-2}$$
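The counts for the complete graph can be verified by brute force for small $n$. The Python sketch below reads "containing edge $e$" as "some edge of $G$ is mapped onto $e$", which is the interpretation that matches the factor $2|E|$ in part (b); that reading, the function name and the test graph (a path on three vertices) are assumptions made for the illustration. It checks the exact identities $N_G(K_n) = \binom{n}{|V|}|V|!$, $N_G(K_n, e) = 2|E|\binom{n-2}{|V|-2}(|V|-2)!$ and $\sum_{e' \neq e} N_G(K_n, e, e') = (|E|-1) N_G(K_n, e)$.

```python
from itertools import combinations, permutations
from math import comb, factorial


def count_embeddings(n, m, g_edges, required=()):
    """Ordered m-tuples of distinct vertices of K_n whose image of G
    uses every edge listed in `required`."""
    required = {frozenset(e) for e in required}
    total = 0
    for v in permutations(range(n), m):
        image = {frozenset((v[i], v[j])) for i, j in g_edges}
        if required <= image:
            total += 1
    return total


n, m = 6, 3
path = [(0, 1), (1, 2)]  # G: path on 3 vertices, so |V| = 3 and |E| = 2
e = (0, 1)               # the fixed edge of K_n

n_g = count_embeddings(n, m, path)
n_g_e = count_embeddings(n, m, path, [e])
pair_sum = sum(
    count_embeddings(n, m, path, [e, other])
    for other in combinations(range(n), 2)
    if other != e
)
```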



Lemma 10 For an edge $a$ in the graph $G$, denote by $G_a$ the graph obtained from $G$ by removing
the edge $a$. Then

$$\sum_{e' \neq e} N_G(X, e, e') = \sum_{a \in E(G)} N_{G_a}(X, e). \tag{7}$$

Proof. The sum on the left-hand side of (7) counts the total number of isomorphic embeddings of
$G$ that contain the edge $e$ in the configuration $X \cup \{e'\}$, for some $e'$ with the edge $e'$ marked. Now,
each isomorphism with marked edge $e'$ is counted on the right-hand side of (7) for the choice $a$ equal
to the marked edge in the graph $G$, with the same isomorphism restricted to $G_a$. Conversely, for
each $a \in E(G)$ and each subgraph embedding, the same embedding is counted on the left-hand
side with the edge $e'$ situated at the location $a$.



2.2 Burn-in period 

In this section we show that after a suitably short "burn in" period, the Markov chain is in the
good set $\mathbf{G}$. Let $r_{\max}(X) = \max_{e,\, G \in \mathbf{G}_L} r_G(X, e)$. The following lemma bounds the expected drift of
$N_G(X, e)$.

Lemma 11 The expected change in $N_G(X, e)$ after one step of the Glauber dynamics, starting
from the configuration $X$, can be bounded as

$$E\left[ \frac{N_G(X(1), e) - N_G(X(0), e)}{n^{|V|-2}} \right] \le (1 + o(1)) \frac{2}{\binom{n}{2}} |E| (|E| - 1) \left[ -r(G, e)^{|E|-1} + \varphi(r_{\max}) (r_{\max})^{|E|-2} \right].$$



Proof. For ease of notation, we suppress the dependence of $r_{\max}$ on the configuration $X$. The
expected change, after one step of the Glauber dynamics, in the number of isomorphisms from
$G$ to subgraphs of $X$ containing the edge $e$ can be counted by first negating the expected loss
in number when removing a random edge $e'$ (leaving the configuration unchanged if $e'$ was not
present), and then adding the expected number of graphs created by including a random edge $e'$.
This gives

$$E\left[ \frac{N_G(X(1), e) - N_G(X(0), e)}{n^{|V|-2}} \right] = \frac{1}{n^{|V|-2}} \left[ -\binom{n}{2}^{-1} (|E| - 1) N_G(X, e) + \binom{n}{2}^{-1} \sum_{e' \neq e} N_G(X, e, e') P(X_{e'}(1) = 1 \mid e' \text{ updated}) \right]. \tag{8}$$

Now, we may upper bound the probability of including an edge using Lemma 3 and the definition
of $r_{\max}$:

$$\begin{aligned}
P(X_{e'}(1) = 1 \mid e' \text{ updated}) &= \frac{\exp(\partial_{e'} H(X))}{\exp(\partial_{e'} H(X)) + 1} \\
&= \frac{\exp\left( \sum_i \beta_i \frac{N_{G_i}(X, e')}{n^{|V_i|-2}} \right)}{\exp\left( \sum_i \beta_i \frac{N_{G_i}(X, e')}{n^{|V_i|-2}} \right) + 1} \\
&\le (1 + o(1)) \frac{\exp\left( \sum_i \beta_i \frac{N_{G_i}(K_n, e') (r_{\max})^{|E_i|-1}}{n^{|V_i|-2}} \right)}{\exp\left( \sum_i \beta_i \frac{N_{G_i}(K_n, e') (r_{\max})^{|E_i|-1}}{n^{|V_i|-2}} \right) + 1} \\
&= (1 + o(1)) \varphi(r_{\max}).
\end{aligned} \tag{9}$$

Next, by Lemmas 9 and 10 and the definition of $r_{\max}$, we have

$$\begin{aligned}
\sum_{e' \neq e} N_G(X, e, e') &= \sum_{a} N_{G_a}(X, e) \\
&\le \sum_{a} N_{G_a}(K_n, e) (r_{\max})^{|E|-2} \\
&= \sum_{e' \neq e} N_G(K_n, e, e') (r_{\max})^{|E|-2} \\
&= (|E| - 1)\, 2 |E| n^{|V|-2} (r_{\max})^{|E|-2} (1 + o(1)).
\end{aligned} \tag{10}$$

Using the estimates (9) and (10), equation (8) gives

$$\begin{aligned}
E\left[ \frac{N_G(X(1), e) - N_G(X(0), e)}{n^{|V|-2}} \right]
&\le (1 + o(1)) \frac{1}{n^{|V|-2} \binom{n}{2}} \left[ -(|E| - 1) N_G(X, e) + \varphi(r_{\max})\, 2 |E| (|E| - 1) n^{|V|-2} (r_{\max})^{|E|-2} \right] \\
&= (1 + o(1)) \frac{1}{n^{|V|-2} \binom{n}{2}} \left[ -(|E| - 1)\, 2 |E| n^{|V|-2} r(G, e)^{|E|-1} + \varphi(r_{\max})\, 2 |E| (|E| - 1) n^{|V|-2} (r_{\max})^{|E|-2} \right] \\
&= (1 + o(1)) \frac{2}{\binom{n}{2}} |E| (|E| - 1) \left[ -r(G, e)^{|E|-1} + \varphi(r_{\max}) (r_{\max})^{|E|-2} \right].
\end{aligned}$$

Lemma 12 Let $p^*$ be a solution of the equation $\varphi(p) = p$ with $\varphi'(p^*) < 1$, and let $\bar{p}$ be the least
solution greater than $p^*$ of the equation $\varphi(p) = p$ if such a solution exists or 1 otherwise. Let the
initial configuration be $X(0)$, with $p^* + \mu \le r_{\max}(X(0)) \le \bar{p} - \mu$ for some $\mu > 0$. Then there is a
$\delta, c > 0$, depending only on $\mu$, $L$ and $\varphi$, so that after $T = cn^2$ steps of the Glauber dynamics, it
holds that $r_{\max}(X(T)) \le r_{\max}(X(0)) - \delta$ with probability $1 - e^{-\Omega(n)}$.



Proof. The lemma is proved by coupling each of the random variables $N_G(X(t), e)$ (one for
each edge $e$ and graph $G$), with an independent biased random walk.

Choose $\epsilon, \delta > 0$ so that for any $r \in [p^* + \mu, \bar{p} - \mu - \delta]$,

$$(r - 2\delta)^{|E|-1} > \varphi(r + \delta)(r + \delta)^{|E|-2} + \epsilon. \tag{11}$$



It follows by Lemma 11 that if $r_G(X(t), e) > r_{\max}(X(0)) - 2\delta$ and $r_{\max}(X(t)) \le r_{\max}(X(0)) + \delta$,
then for sufficiently large $n$,

$$E\left[ \frac{N_G(X(t+1), e) - N_G(X(t), e)}{n^{|V|-2}} \right] \le -\frac{\gamma}{n^2}$$

for some $\gamma > 0$ depending only on $\varphi$, $\delta$, and $\epsilon$. Using this negative drift we bound the probability
that any of the random variables $r_G(X(t), e)$ exceed $r_{\max}(X(0)) + \delta$ before time $T$.

Define the event

$$A_t(\delta) = \bigcap_{e, G} \{r_G(X(t), e) \le r_{\max}(X(0)) + \delta\},$$

and put

$$D_t(e, G, \delta) = A_t \cap \{r_{\max}(X(0)) - 2\delta \le r_G(X(t), e) \le r_{\max}(X(0)) + \delta\},$$

and

$$B_{t_1, t_2}(e, G, \delta) = \left( \bigcap_{t_1 \le t < t_2} D_t \right) \cap \{r_G(X(t_2), e) - r_G(X(t_1), e) > \delta/2\}.$$

$B_{t_1, t_2}(e, G, \delta)$ is the event that all the edge statistics $r_{G'}(X(t), e')$ behave well starting at time $t_1$
up to and including time $t_2 - 1$, and the statistic $r_G(X(t), e)$ increases by at least $\delta/2$ in the time
period from $t_1$ to $t_2$.

The event that some $r_G(X(t), e)$ exceeds $r_{\max}(X(0)) + \delta$ at some time $\tau$, $1 \le \tau \le T$, is contained
in the event $\bigcup_{e, G} \bigcup_{0 \le t_1 < t_2 \le T} B_{t_1, t_2}(e, G, \delta)$. The next claim bounds the probability of the bad
event for a particular choice of edge $e$ and graph $G$, and the proof of the lemma follows.

Claim 13 The probability of the event $\bigcup_{0 \le t_1 < t_2 \le T} B_{t_1, t_2}(e, G, \delta)$ is bounded as

$$P\left( \bigcup_{0 \le t_1 < t_2 \le T} B_{t_1, t_2}(e, G, \delta) \right) \le e^{-\Omega(n)}. \tag{12}$$



Proof.

For all $X$ we have $N_{G_i}(X, e, e') \le N_{G_i}(K_n, e, e')$. The term $N_{G_i}(K_n, e, e')$ is the number of graphs
$G_i$ in the complete graph containing both $e$ and $e'$. In the case that the two edges $e$ and $e'$ share
a vertex they define 3 vertices, which leaves at most $|V_i| - 3$ remaining vertices to be chosen. It
follows that $N_{G_i}(K_n, e, e') \le O(n^{|V_i|-3})$ and so

$$\frac{N_{G_i}(X, e, e')}{n^{|V_i|-2}} = O(n^{-1}). \tag{13}$$

Note that an adjacent edge $e'$ is only chosen with probability $O(n^{-1})$. When $e$ and $e'$ do not
share a vertex then

$$\frac{N_{G_i}(X, e, e')}{n^{|V_i|-2}} = O(n^{-2}). \tag{14}$$

Although the claim concerns the random variable $r(G, e)$, we will work with the related random
variable

$$Y_t = \frac{N_G(X(t), e)}{n^{|V|-2}}.$$

The first step is to compute a bound on the moment generating function of

$$S_{t_1, t_2} = \sum_{t = t_1 + 1}^{t_2} \left( Y_t - Y_{t-1} + \frac{\gamma}{2n^2} \right) 1(D_{t-1}(e, G, \delta)).$$

The random variable $S_{t_1, t_2}$ is the change in $Y_t$ from time $t_1$ to $t_2$ while all the edge statistics are
within the appropriate interval, shifted by $\gamma/2n^2$ per time step. Clearly we have the containment

$$B_{t_1, t_2}(e, G, \delta) \subseteq \{S_{t_1, t_2} \ge \delta/2\}. \tag{15}$$

We have

$$E\left[ e^{\theta S_{t_1, t_2}} \right] = E\left[ e^{\theta S_{t_1, t_2 - 1}} E\left( e^{\theta \left( Y_{t_2} - Y_{t_2 - 1} + \frac{\gamma}{2n^2} \right) 1(D_{t_2 - 1}(e, G, \delta))} \mid \mathcal{F}_{t_2 - 1} \right) \right].$$



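From Lemma 11 and equation (11) it follows that $E((Y_t - Y_{t-1}) 1(D_{t-1}(e, G, \delta)) \mid \mathcal{F}_{t-1}) \le -\gamma/n^2$.
Recalling that with probability $1 - O(n^{-1})$ it holds that $|Y_t - Y_{t-1}| = O(n^{-2})$, and it always holds
that $|Y_t - Y_{t-1}| = O(n^{-1})$, we have

$$\begin{aligned}
E&\left( e^{\theta \left( Y_{t_2} - Y_{t_2-1} + \frac{\gamma}{2n^2} \right) 1(D_{t_2-1}(e, G, \delta))} \mid \mathcal{F}_{t_2-1} \right) \\
&= \sum_{k=0}^{\infty} E\left[ \frac{\theta^k \left( Y_{t_2} - Y_{t_2-1} + \frac{\gamma}{2n^2} \right)^k}{k!} 1(D_{t_2-1}(e, G, \delta))^k \mid \mathcal{F}_{t_2-1} \right] \\
&\le 1 - 1(D_{t_2-1}(e, G, \delta)) \frac{\gamma \theta}{2n^2} + 1(D_{t_2-1}(e, G, \delta))\, \theta^2 E\left[ \sum_{k=2}^{\infty} \frac{\theta^{k-2} \left| Y_{t_2} - Y_{t_2-1} + \frac{\gamma}{2n^2} \right|^k}{k!} \mid \mathcal{F}_{t_2-1} \right] \\
&= 1 - 1(D_{t_2-1}(e, G, \delta)) \left( \frac{\gamma \theta}{2n^2} - O\left( \frac{\theta^2}{n^3} \right) \right).
\end{aligned}$$

Thus, when we take $\theta = cn$ for sufficiently small $c$ we have that

$$E\left( e^{\theta \left( Y_{t_2} - Y_{t_2-1} + \frac{\gamma}{2n^2} \right) 1(D_{t_2-1}(e, G, \delta))} \mid \mathcal{F}_{t_2-1} \right) \le 1$$

and so

$$E\left[ e^{\theta S_{t_1, t_2}} \right] \le E\left[ e^{\theta S_{t_1, t_2 - 1}} \right] \le 1,$$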



where the second inequality follows by iterating the argument leading to the first inequality. We
can choose $\alpha > 0$ depending only on $L$ and $\delta$ such that for any graph in $\mathbf{G}_L$,

$$\alpha \le \sup_{x \in [p^*, 1]} \{(x + \delta/2)^{|E|-1} - x^{|E|-1}\}.$$

This gives the estimate

$$P(Y_{t_2} - Y_{t_1} \ge \alpha) \le e^{-\Omega(n)} \tag{16}$$

and so

$$P(r_G(X(t_2), e) - r_G(X(t_1), e) > \delta/2) = e^{-\Omega(n)}. \tag{17}$$

We may now apply (17) to equation (15), resulting in

$$P\left( \bigcup_{0 \le t_1 < t_2 \le T} B_{t_1, t_2}(e, G, \delta) \right) \le T^2 e^{-\Omega(n)} = e^{-\Omega(n)}, \tag{18}$$

which proves the claim.



Next, we argue that if all of the random variables $r_G(X(t),e)$ remain below $r_{\max}+\delta$, then each
random variable actually ends below $r_{\max}-\delta$ with exponentially high probability. We prove this by
showing that each random walk actually reaches $r_{\max}-2\delta$, and then by the claim has exponentially
small probability of increasing to $r_{\max}-\delta$. Suppose that for some $e, G$, $r_G(X(0),e) > r_{\max}-2\delta$.
Then for $T = cn^2$,

$$\begin{aligned}
P&\left(r_G(X(t),e) > r_{\max}-2\delta \text{ for } 1\le t\le T\right) \\
&\le P\Big(r_G(X(t),e) > r_{\max}-2\delta \text{ for } 1\le t\le T,\ \bigcap_{1\le t\le T} A_t\Big) + e^{-\Omega(n)} \\
&\le P\Big(r_G(X(t),e) > r_{\max}-2\delta \text{ for } 1\le t\le T,\ \bigcap_{1\le t\le T} D_t(e,G,\delta)\Big) + e^{-\Omega(n)} \\
&\le P\left(S_{1,T} > -1 + \frac{c\gamma}{2}\right) + e^{-\Omega(n)},
\end{aligned}$$

where the last step follows since each of the $T$ increments in $S_{1,T}$ contributes $\gamma/(2n^2)$ on the event
$\bigcap_{1\le t\le T} D_t(e,G,\delta)$. Choosing $c > 3/\gamma$ and using the estimate on the deviation of $S_{t_1,t_2}$ (17) gives

$$P\left(r_G(X(t),e) > r_{\max}-2\delta \text{ for } 1\le t\le T\right) \le e^{-\Omega(n)}.$$

Finally, we have

$$\begin{aligned}
P&\left(r_G(X(T),e) > r_{\max}-\delta\right) \\
&\le P\left(r_G(X(T),e) > r_{\max}-\delta,\ r_G(X(t),e) \le r_{\max}-2\delta \text{ for some } t\in[1,T]\right) + e^{-\Omega(n)} \\
&\le P\Big(\bigcup_{1\le t_1\le T} B_{t_1,T}(e,G,\delta)\Big) + e^{-\Omega(n)} \le e^{-\Omega(n)}.
\end{aligned}$$

The union bound on probabilities applied over the set of edges $e$ and graphs $G$ completes the
proof of Lemma 12.

The following lemmas follow immediately from iterating Lemma 12.



Lemma 14 In the high temperature phase, for any $\epsilon > 0$ there is $c > 0$ such that for any initial
configuration $X(0) = x$, when $t \ge cn^2$ we have

$$P\left(r_{\max}(X(t)) \ge p^*+\epsilon \mid X(0)=x\right) \le e^{-\Omega(n)},$$
$$P\left(r_{\min}(X(t)) \le p^*-\epsilon \mid X(0)=x\right) \le e^{-\Omega(n)}.$$

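Lemma 15 In the low temperature phase suppose that $p^*$ is a solution to $p = \varphi(p)$ and $\varphi'(p^*) < 1$.
There exists an $\epsilon > 0$ such that if for some initial configuration $X(0)$ we have that $r_{\max}(X(0)) \le
p^*+\epsilon$ and $r_{\min}(X(0)) \ge p^*-\epsilon$, then for some $\alpha > 0$

$$P\Big(\sup_{0\le t\le e^{\alpha n}} r_{\max}(X(t)) \ge p^*+2\epsilon\Big) \le e^{-\Omega(n)},$$

$$P\Big(\inf_{0\le t\le e^{\alpha n}} r_{\min}(X(t)) \le p^*-2\epsilon\Big) \le e^{-\Omega(n)}.$$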



2.3 Path coupling 

Lemma 16 Let $p^* \in [0,1]$ be a solution of the equation $\varphi(p) = p$ and suppose $0 < \varphi'(p^*) < 1$.
There exist $\epsilon, \delta > 0$ sufficiently small such that the following holds. Suppose that $X^+(0) \ge
X^-(0)$ are two configurations that differ at exactly one edge $e$. Suppose further that for all graphs
$G$ with at most $L$ vertices and all edges $e'$

$$|r(G,e') - p^*| < \epsilon. \qquad (19)$$

Then for sufficiently large $n$ a single step of the Glauber dynamics can be coupled so that

$$E\, d_H(X^+(1),X^-(1)) \le 1 - \delta n^{-2}.$$



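Proof.

We take the standard monotone coupling. Suppose that an edge $e' \ne e$ is chosen to be updated
by the Markov chain. Then

$$P\left(X^\pm_{e'}(1) = 1\right) = \frac{\exp\left(\partial_{e'}H(X^\pm(0))\right)}{1+\exp\left(\partial_{e'}H(X^\pm(0))\right)}. \qquad (20)$$

Since

$$\partial_{e'}H(X^\pm(0)) = \sum_{i=1}^{s} \frac{\beta_i N_{G_i}(X^\pm(0),e')}{n^{|V_i|-2}},$$

by Lemma 9 and equation (19) we have that for large enough $n$,

$$\partial_{e'}H(X^\pm(0)) \le (1+o(1))\,\Psi(p^*+\epsilon). \qquad (21)$$

Similarly

$$(1-o(1))\,\Psi(p^*-\epsilon) \le \partial_{e'}H(X^\pm(0)),$$

and so it follows that for any $\epsilon' > 0$, for large enough $n$ and for small enough $\epsilon$, we have that

$$\frac{d}{dx}\frac{e^x}{1+e^x}\bigg|_{\partial_{e'}H(X^+(0))} \le (1+\epsilon')\,\frac{d}{dx}\frac{e^x}{1+e^x}\bigg|_{\Psi(p^*)}. \qquad (22)$$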



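We now bound the sum of the $\partial_e\partial_{e'}H(X^+(0))$ terms:

$$\begin{aligned}
\sum_{e'\ne e} \partial_e\partial_{e'}H(X^+(0))
&= \sum_{e'\ne e}\sum_{i=1}^{s} \frac{\beta_i N_{G_i}(X^+(0),e,e')}{n^{|V_i|-2}} \\
&= \sum_{i=1}^{s} \frac{\beta_i \sum_{a\in E(G_i)} N_{(G_i)_a}(X^+(0),e)}{n^{|V_i|-2}} \\
&\le \sum_{i=1}^{s} \frac{\beta_i \sum_{a\in E(G_i)} (p^*+\epsilon)^{|E_i|-2} N_{(G_i)_a}(K_n,e)}{n^{|V_i|-2}} \\
&= \sum_{i=1}^{s} \frac{\beta_i \sum_{e'\ne e} (p^*+\epsilon)^{|E_i|-2} N_{G_i}(K_n,e,e')}{n^{|V_i|-2}},
\end{aligned}$$

where the second and fourth lines follow from Lemma 10 and the inequality follows from Equation
(19). By Lemma 9 we have that

$$\sum_{e'\ne e} \partial_e\partial_{e'}H(X^+(0)) \le (1+o(1)) \sum_{i=1}^{s} 2\beta_i |E_i|(|E_i|-1)(p^*+\epsilon)^{|E_i|-2} = (1+o(1))\,\Psi'(p^*+\epsilon). \qquad (23)$$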
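By Taylor series for small $h$ we have that

$$\frac{e^{x+h}}{1+e^{x+h}} - \frac{e^x}{1+e^x} \le \frac{d}{dx}\frac{e^x}{1+e^x}\,\left(h + O(h^2)\right),$$

and so using Equation (20),

$$\begin{aligned}
P\left(X^+_{e'}(1)=1\right) - P\left(X^-_{e'}(1)=1\right)
&\le \frac{d}{dx}\frac{e^x}{1+e^x}\bigg|_{\partial_{e'}H(X^+(0))} \left(\partial_e\partial_{e'}H(X^+(0)) + O\left((\partial_e\partial_{e'}H(X^+(0)))^2\right)\right) \qquad (24) \\
&\le (1+\epsilon')(1+o(1))\,\partial_e\partial_{e'}H(X^+(0))\,\frac{d}{dx}\frac{e^x}{1+e^x}\bigg|_{\Psi(p^*)} \qquad (25)
\end{aligned}$$

using equation (22) and the fact that by equation (13) we have that $\partial_e\partial_{e'}H(X^+(0)) = O(n^{-1})$.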

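Each edge $e'$ has probability $\binom{n}{2}^{-1}$ of being updated, and if edge $e$ is chosen to be updated then
the number of disagreements is 0. It follows by equations (24) and (23) that for any $\epsilon'' > 0$

$$\begin{aligned}
E\, d_H(X^+(1),X^-(1))
&\le 1 - \binom{n}{2}^{-1}\left[1 - \sum_{e'\ne e}(1+\epsilon')(1+o(1))\,\partial_e\partial_{e'}H(X^+(0))\,\frac{d}{dx}\frac{e^x}{1+e^x}\bigg|_{\Psi(p^*)}\right] \\
&\le 1 - \binom{n}{2}^{-1}\left[1 - (1+\epsilon')(1+o(1))\,\Psi'(p^*+\epsilon)\,\frac{d}{dx}\frac{e^x}{1+e^x}\bigg|_{\Psi(p^*)}\right] \\
&\le 1 - \binom{n}{2}^{-1}\left[1 - (1+\epsilon'')(1+o(1))\,\varphi'(p^*)\right]
\end{aligned}$$

provided that $\epsilon, \epsilon'$ are sufficiently small. The result follows, since $\varphi'(p^*) < 1$.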
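
The role of the condition $\varphi'(p^*) < 1$ can be explored numerically. The following is only a sketch under stated assumptions, not the authors' code: it assumes $\Psi(p) = \sum_i 2\beta_i |E_i| p^{|E_i|-1}$ (the antiderivative, with zero constant, of the sum in (23)), $\varphi(p) = e^{\Psi(p)}/(1+e^{\Psi(p)})$, and the convention that the high temperature phase has a unique fixed point of $\varphi$ with $\varphi' < 1$ while the low temperature phase has at least two such fixed points. All function names and the edge-triangle parameter values are hypothetical.

```python
import math

def psi(p, betas, edge_counts):
    """Assumed form: Psi(p) = sum_i 2 * beta_i * |E_i| * p^(|E_i| - 1)."""
    return sum(2.0 * b * m * p ** (m - 1) for b, m in zip(betas, edge_counts))

def psi_prime(p, betas, edge_counts):
    """Derivative of psi; single-edge terms (|E_i| = 1) are constant and drop out."""
    return sum(2.0 * b * m * (m - 1) * p ** (m - 2)
               for b, m in zip(betas, edge_counts) if m >= 2)

def phi(p, betas, edge_counts):
    """Assumed form: phi(p) = exp(Psi(p)) / (1 + exp(Psi(p)))."""
    return 1.0 / (1.0 + math.exp(-psi(p, betas, edge_counts)))

def phi_prime(p, betas, edge_counts):
    """Chain rule: phi' = phi * (1 - phi) * Psi'."""
    f = phi(p, betas, edge_counts)
    return f * (1.0 - f) * psi_prime(p, betas, edge_counts)

def fixed_points(betas, edge_counts, grid=20000, tol=1e-12):
    """Locate solutions of phi(p) = p on [0, 1] by a grid scan plus bisection."""
    def g(p):
        return phi(p, betas, edge_counts) - p
    roots = []
    for i in range(grid):
        a, b = i / grid, (i + 1) / grid
        ga, gb = g(a), g(b)
        if ga == 0.0:
            roots.append(a)
        elif ga * gb < 0.0:
            while b - a > tol:
                mid = 0.5 * (a + b)
                if g(a) * g(mid) <= 0.0:
                    b = mid
                else:
                    a = mid
            roots.append(0.5 * (a + b))
    return roots

def regime(betas, edge_counts):
    """Classify parameters under the convention assumed in the lead-in."""
    fps = fixed_points(betas, edge_counts)
    stable = [p for p in fps if phi_prime(p, betas, edge_counts) < 1.0]
    if len(fps) == 1 and len(stable) == 1:
        return "high"
    if len(stable) >= 2:
        return "low"
    return "neither"

if __name__ == "__main__":
    # Edge-triangle model: |E_1| = 1 (edge), |E_2| = 3 (triangle).
    for betas in ([0.1, 0.0], [-2.0, 4.0 / 3.0]):
        fps = fixed_points(betas, [1, 3])
        print(betas, regime(betas, [1, 3]),
              [(round(p, 4), round(phi_prime(p, betas, [1, 3]), 3)) for p in fps])
```

In the second parameter choice the scan finds three fixed points, and the middle one has $\varphi' > 1$, so the contraction argument of Lemma 16 would not apply near it.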
Proof of Theorem 5.

We begin by proving the high temperature phase using a coupling argument. Let $X^+(t)$ and
$X^-(t)$ be two copies of the Markov chain started from the complete and empty configurations,
respectively, and coupled using the monotone coupling. Since this is a monotone system, it follows
that if $P(X^+(t) \ne X^-(t)) < e^{-1}$, then $t$ is an upper bound on the mixing time. The function $\varphi$
satisfies the hypothesis of Lemma 16, so choose $\epsilon$ and $\delta$ according to the lemma. Let $A_t$
be the event that for all graphs $G$ with at most $L$ vertices and all edges $e$

$$|r(G,e) - p^*| < \epsilon \qquad (26)$$

for both $X^+(t)$ and $X^-(t)$. By Lemma 14 we have that if $t \ge cn^2$, then $P(A_t) \ge 1 - e^{-\Omega(n)}$.



Since the subgraph counts $N_G(X,e)$ are monotone in $X$, if $X^+(t)$ and $X^-(t)$ both satisfy equation
(26), then there exists a sequence of configurations $X^-(t) = X^0 \le X^1 \le \dots \le X^d = X^+(t)$,
where $d = d_H(X^+(t),X^-(t))$, each pair $X^i, X^{i+1}$ differs at exactly one edge, and each $X^i$ satisfies
equation (26). Such a sequence is constructed by adding one edge at a time to $X^-(t)$ until $X^+(t)$
is reached. Applying path coupling to this sequence, we have by Lemma 16 that

$$E\left[d_H(X^+(t+1),X^-(t+1)) \mid X^+(t),X^-(t),A_t\right] \le (1-\delta n^{-2})\, d_H(X^+(t),X^-(t)).$$

Since $d_H(X^+(t),X^-(t)) \le \binom{n}{2}$, we have the inequality

$$\begin{aligned}
E\left[d_H(X^+(t+1),X^-(t+1))\right] &\le (1-\delta n^{-2})\, E\left[d_H(X^+(t),X^-(t)) \mid A_t\right] P(A_t) + \binom{n}{2}\left(1-P(A_t)\right) \\
&\le (1-\delta n^{-2})\, E\left[d_H(X^+(t),X^-(t))\right] + \binom{n}{2} e^{-\Omega(n)}.
\end{aligned}$$

Iterating this equation, we get that for $t > Cn^2$,

$$\begin{aligned}
E\left[d_H(X^+(t),X^-(t))\right] &\le (1-\delta n^{-2})^{t-Cn^2}\binom{n}{2} + \sum_{j=Cn^2}^{t} (1-\delta n^{-2})^{t-j}\binom{n}{2} e^{-\Omega(n)} \\
&\le \exp\left(-\delta n^{-2}(t-Cn^2)\right) n^2 + e^{-\Omega(n)}.
\end{aligned}$$

Then for any $\epsilon' > 0$, when $t \ge \frac{2+\epsilon'}{\delta} n^2\log n$ we have that for large enough $n$,

$$E\left[d_H(X^+(t),X^-(t))\right] = o(1).$$

It follows by Markov's inequality that $P(X^+(t) \ne X^-(t)) = o(1)$, which establishes that the
mixing time is bounded by $\frac{2+\epsilon'}{\delta} n^2\log n$.
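
The monotone coupling used in this proof can be simulated directly for small $n$. The sketch below is an illustration under assumptions that are not taken from the text: it uses an edge-triangle model with labeled subgraph counts, so that the derivative of the Hamiltonian at edge $\{i,j\}$ is taken to be $2\beta_1 + 6\beta_2 c_{ij}/n$, where $c_{ij}$ is the number of common neighbours of $i$ and $j$, and it requires $\beta_2 \ge 0$ so that the dynamics are monotone. The names `coupled_glauber` and `hamming` are hypothetical.

```python
import math
import random

def coupled_glauber(n, beta1, beta2, steps, seed=0):
    """Run two Glauber chains, from the complete and the empty graph, that
    share the same edge choice and the same uniform variable at every step."""
    rng = random.Random(seed)
    hi = [[i != j for j in range(n)] for i in range(n)]   # complete graph
    lo = [[False] * n for _ in range(n)]                  # empty graph

    def update_prob(adj, i, j):
        # Assumed edge-triangle derivative: 2*beta1 + 6*beta2*(common neighbours)/n.
        common = sum(1 for v in range(n) if adj[i][v] and adj[j][v])
        x = 2.0 * beta1 + 6.0 * beta2 * common / n
        return 1.0 / (1.0 + math.exp(-x))

    for _ in range(steps):
        i, j = rng.sample(range(n), 2)   # uniformly chosen edge {i, j}
        u = rng.random()                 # shared randomness: monotone coupling
        for adj in (hi, lo):
            present = u < update_prob(adj, i, j)
            adj[i][j] = adj[j][i] = present
    return hi, lo

def hamming(a, b):
    """Number of edges on which two configurations disagree."""
    n = len(a)
    return sum(a[i][j] != b[i][j] for i in range(n) for j in range(i + 1, n))

if __name__ == "__main__":
    n = 12
    for steps in (0, 200, 1000, 5000):
        hi, lo = coupled_glauber(n, beta1=-0.2, beta2=0.1, steps=steps, seed=1)
        print(steps, hamming(hi, lo))
```

Because both chains use the same uniform variable and the update probability is increasing in the configuration when $\beta_2 \ge 0$, the ordering between the two chains is preserved at every step; once the Hamming distance reaches zero the chains move together.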

2.4 Slow mixing for local Markov chains in low-temperature regime 

We will use the following conductance result, which is taken from [10] (Claim 2.3): 

Claim 17 Let $M$ be a Markov chain with state space $\Omega$, transition matrix $P$, and stationary
distribution $\pi$. Let $A \subset \Omega$ be a set of states such that $\pi(A) \le \frac{1}{2}$, and $B \subset \Omega$ be a set of states
that form a "barrier" in the sense that $P_{ij} = 0$ whenever $i \in A \setminus B$ and $j \in A^c \setminus B$. Then the
mixing time of $M$ is at least $\pi(A)/8\pi(B)$.
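
As a toy illustration of how the claim is applied, the following sketch checks the hypotheses for a finite chain and evaluates the bound $\pi(A)/8\pi(B)$. The function name and the three-state example chain are hypothetical and serve only to show the mechanics; they are not part of the argument.

```python
def barrier_lower_bound(P, pi, A, B):
    """Check the hypotheses of the barrier claim for transition matrix P with
    stationary distribution pi, then return the bound pi(A) / (8 * pi(B))."""
    states = set(range(len(P)))
    A, B = set(A), set(B)
    pi_A = sum(pi[i] for i in A)
    pi_B = sum(pi[i] for i in B)
    if pi_A > 0.5:
        raise ValueError("need pi(A) <= 1/2")
    # Barrier property: no direct transition from A \ B to A^c \ B.
    for i in A - B:
        for j in states - A - B:
            if P[i][j] != 0:
                raise ValueError("B is not a barrier between A and its complement")
    return pi_A / (8.0 * pi_B)

if __name__ == "__main__":
    # Hypothetical reversible chain on {0, 1, 2} with a low-probability middle state.
    P = [[0.99, 0.01, 0.0],
         [0.495, 0.01, 0.495],
         [0.0, 0.01, 0.99]]
    pi = [0.495, 0.01, 0.495]
    print(barrier_lower_bound(P, pi, A={0}, B={1}))
```

The smaller the stationary mass of the barrier relative to $A$, the larger the lower bound on the mixing time, which is the mechanism exploited in the proof of slow mixing.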

Using this result we prove slow mixing for any local Markov chain.

Proof of Theorem 6. Suppose $p_1$ and $p_2$ are solutions of the equation $\varphi(p) = p$ with $\varphi'(p_1) < 1$, $\varphi'(p_2) < 1$,
and choose $\epsilon > 0$ sufficiently small so that $\varphi(p) < p$ for $p \in (p_i, p_i+3\epsilon]$ and $\varphi(p) > p$ for
$p \in [p_i-3\epsilon, p_i)$, for $i = 1,2$. Let

$$A_i = \{X : r_{\max}(X) \le p_i+\epsilon \text{ and } r_{\min}(X) \ge p_i-\epsilon\}, \quad i = 1,2,$$

and suppose the set $A_1$ has smaller probability (switching the labels $p_1$ and $p_2$ if necessary), so
$\pi(A_1) \le \frac{1}{2}$. We note that for large enough $n$, $\pi(A_i) > 0$ since with high probability an Erdős-Rényi
random graph $G(n,p_i)$ is in $A_i$. In the remainder of the proof we will omit the subscript,
i.e. let $A = A_1$ and $p = p_1$. Now, clearly the set

$$B = \{X : p+\epsilon < r_{\max}(X) \le p+2\epsilon \text{ or } p-\epsilon > r_{\min}(X) \ge p-2\epsilon\}$$

forms a barrier (for sufficiently large $n$) between the sets $A$ and $A^c$ for any Markov chain that
updates only $o(n)$ edges per time-step, since each edge update can change each of $r_{\max}$ and $r_{\min}$
by at most $O(1/n)$.

It remains only to bound the relative probabilities of the sets $A$ and $B$. Let $C = A^c \setminus B$, and let
$t = cn^2$ such that Lemma 12 holds. Then

$$P(X(t) \in C \mid X(0) \in B) = e^{-\Omega(n)} \qquad (27)$$

and

$$P(X(t) \in B \mid X(0) \in A\cup B) = e^{-\Omega(n)}. \qquad (28)$$

Let the configuration $X(0)$ be drawn according to the Gibbs measure $\pi = p_n$ defined above,
and let $X(t)$ be the configuration resulting after $t$ steps of the Glauber dynamics. Because
the Glauber dynamics has stationary distribution $\pi$, $X(t)$ has the same distribution as $X(0)$. By
the reversibility of the Glauber dynamics and the estimate (27) we have

$$\begin{aligned}
P(X(t)\in B, X(0)\in C) &= P(X(t)\in C, X(0)\in B) \\
&= P(X(t)\in C \mid X(0)\in B)\, P(X(0)\in B) \\
&= e^{-\Omega(n)} P(X(0)\in B). \qquad (29)
\end{aligned}$$

Similarly, using (28),

$$\begin{aligned}
P(X(t)\in B, X(0)\in A\cup B) &= P(X(t)\in B \mid X(0)\in A\cup B)\, P(X(0)\in A\cup B) \\
&\le e^{-\Omega(n)} P(X(0)\in A\cup B) \qquad (30) \\
&= e^{-\Omega(n)}\left(P(X(0)\in A) + P(X(0)\in B)\right).
\end{aligned}$$

Combining (29) and (30), we have

$$\begin{aligned}
\pi(B) = P(X(t)\in B) &= P(X(t)\in B, X(0)\in C) + P(X(t)\in B, X(0)\in A\cup B) \\
&\le e^{-\Omega(n)}\left(P(X(0)\in A) + 2P(X(0)\in B)\right) \qquad (31) \\
&= e^{-\Omega(n)}\left(\pi(A) + 2\pi(B)\right),
\end{aligned}$$

which, upon rearranging, gives

$$\pi(B) \le \frac{e^{-\Omega(n)}}{1-2e^{-\Omega(n)}}\,\pi(A). \qquad (32)$$

Together with Claim 17 this completes the proof.



3 Asymptotic independence of edges and weak pseudo-randomness 



Our burn-in proof in the high temperature regime shows that with high probability all the
$r_G(X,e)$ are close to $p^*$, the fixed point of $\varphi(p) = p$. A consequence is that for any collection of
edges $e_1,\dots,e_k$ the events $x_{e_i}$ are asymptotically independent and distributed as Bernoulli($p^*$).
A consequence of the asymptotic independence of the edges is that with high probability a graph
sampled from the exponential random graph distribution is weakly pseudo-random, as defined in
[13]. As such, the exponential random graph model is extremely similar to the basic Erdős-Rényi
random graph. Since exponential random graphs were introduced to model the phenomenon of
increased numbers of small subgraphs like triangles, this result proves that the model fails in its
stated goal.



Theorem 18 Let $X$ be drawn from the exponential random graph distribution in the high temperature
phase. Let $e_1,\dots,e_k$ be an arbitrary collection of edges with associated indicator random
variables $x_{e_i} = 1(e_i \in X)$. Then for any $\epsilon > 0$, there is an $n$ such that for all $(a_1,\dots,a_k) \in \{0,1\}^k$
the random variables $x_{e_1},\dots,x_{e_k}$ satisfy

$$\left|P(x_{e_1} = a_1,\dots,x_{e_k} = a_k) - (p^*)^{\sum a_i}(1-p^*)^{k-\sum a_i}\right| \le \epsilon.$$

Thus, the random variables $x_{e_1},\dots,x_{e_k}$ are asymptotically independent.
Proof. Fix $\epsilon > 0$. Let $S \subseteq [k]$ and let $x_S = \{x_{e_i} : i \in S\}$ and $x_{S^c} = \{x_{e_i} : i \in [k]\setminus S\}$. Then by
the inclusion-exclusion principle, we have

$$P(x_S = 1, x_{S^c} = 0) = \sum_{T\subseteq[k]-S} (-1)^{|T|} P(x_{S\cup T} = 1). \qquad (33)$$



We argue next that each probability in the preceding sum satisfies $P(x_{S\cup T} = 1) \approx (p^*)^{|S\cup T|}$. Let

$$A = \{X : r_{\max}(X) \le p^*+\epsilon' \text{ and } r_{\min}(X) \ge p^*-\epsilon'\},$$

where $\epsilon'$ is to be specified later. Consider the subgraph $G_T$ formed by the edges in the set $T$. For
all configurations $X \in A$, $|r_{G_T}(X,e) - p^*| \le \epsilon'$, which gives

$$\left|\frac{N_{G_T}(X,e)}{2|E|n^{|V|-2}} - (p^*)^{|E|-1}\right| < \epsilon'' \qquad (34)$$

for sufficiently large $n$. By considering the graph consisting of two disjoint edges, we have that
the number of edges in a configuration $X \in A$ satisfies

$$\left|\frac{N_{\mathrm{edge}}(X)}{n^2} - p^*\right| < \epsilon''.$$

Note that

$$\sum_{e} N_G(X,e) = |E|\,N_G(X), \qquad (35)$$

and summing Equation (34) over the edges in $X$ and using (35), this gives

$$\left|N_{G_T}(X) - (p^*)^k n^{|V|}\right| < \frac{\epsilon}{2^k}\,n^{|V|} \qquad (36)$$

for a sufficiently small choice of $\epsilon'$ in the definition of the set $A$. By symmetry, each of the
subgraphs $G_T$ is equally likely to be included in the configuration $X$, and there are $n^{|V|}$ possible
such subgraphs, so $P(x_T = 1) = N_{G_T}(X)/n^{|V|}$. It follows that $|P(x_T = 1 \mid X \in A) - (p^*)^k| < \epsilon/2^k$.
Recall that $P(A) = 1 + o(1)$ in the high temperature phase. Thus, for any set of edges $T \subseteq [k]$,
it holds that

$$\left|P(x_T = 1) - (p^*)^{|T|}\right| = (1+o(1))\left|P(x_T = 1 \mid X \in A) - (p^*)^{|T|}\right| \le (1+o(1))\,\frac{\epsilon}{2^k}.$$



Hence

$$\left|\sum_{T\subseteq[k]-S} (-1)^{|T|} P(x_{S\cup T} = 1) - \sum_{T\subseteq[k]-S} (-1)^{|T|} (p^*)^{|S\cup T|}\right| \le \epsilon$$

for sufficiently large $n$, and the desired result follows from (33) and the fact that

$$\sum_{T\subseteq[k]-S} (-1)^{|T|} (p^*)^{|S\cup T|} = (p^*)^{|S|} \sum_{q=0}^{k-|S|} \binom{k-|S|}{q} (-p^*)^q = (p^*)^{|S|}(1-p^*)^{k-|S|}.$$
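
The closing identity is an instance of the binomial theorem and can be checked by brute force. The sketch below (hypothetical function name, arbitrary test values) enumerates all subsets $T$ of the $k - |S|$ remaining indices and compares the alternating sum with the closed form.

```python
from itertools import combinations

def inclusion_exclusion_sum(p, k, s):
    """Sum over all T in [k] - S of (-1)^|T| * p^(|S| + |T|), where |S| = s."""
    rest = k - s
    total = 0.0
    for q in range(rest + 1):
        for T in combinations(range(rest), q):
            total += (-1) ** len(T) * p ** (s + len(T))
    return total

def product_form(p, k, s):
    """Closed form p^|S| * (1 - p)^(k - |S|)."""
    return p ** s * (1.0 - p) ** (k - s)

if __name__ == "__main__":
    for p, k, s in [(0.3, 5, 2), (0.7, 6, 0), (0.5, 4, 4)]:
        print(p, k, s, inclusion_exclusion_sum(p, k, s), product_form(p, k, s))
```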



We can also show that an exponential random graph is weakly pseudo-random with high probability.
This means that a collection of equivalent conditions are satisfied; we briefly mention only
a few of them (see the survey on pseudo-random graphs [13]). We will use a different subgraph
count than before: for a graph $G$ let $N^*_G(X)$ be the number of labeled induced copies of $G$ in
$X$. This is different from the counts $N_G(X)$ in that it requires edges missing from $G$ to also be
missing in the induced graph in $X$. A graph $X$ is weakly pseudo-random if it satisfies one of the
following (among others) equivalent properties:

1. For a fixed $l \ge 4$, for all graphs $G$ on $l$ vertices,

$$N^*_G(X) = (1+o(1))\, n^l p^{|E(G)|}(1-p)^{\binom{l}{2}-|E(G)|}.$$

2. $N_{\mathrm{edges}}(X) \ge \frac{n^2 p}{2} + o(n^2)$ and $\lambda_1 = (1+o(1))np$, $\lambda_2 = o(n)$, where the eigenvalues of the
adjacency matrix of $X$ are ordered so that $|\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_n|$.

3. For each subset of vertices $U \subset V(X)$ the number of edges in the subgraph of $X$ induced
by $U$ satisfies $E(H_X(U)) = \frac{p}{2}|U|^2 + o(n^2)$.

4. Let $C_l$ denote the cycle of length $l$, with $l \ge 4$ even. The number of edges in $X$ satisfies
$N_{\mathrm{edges}} = \frac{n^2 p}{2} + o(n^2)$ and the number of cycles $C_l$ satisfies $N_{C_l}(X) \le (np)^l + o(n^l)$.



By (36), for any configuration in the good set, $X \in \mathcal{G}$, the fourth condition is satisfied. This
gives the following corollary.

Corollary 19 (Weak pseudo-randomness) With probability $1 + o(1)$ an exponential random
graph is weakly pseudo-random.
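
The fourth condition, with $l = 4$, is easy to evaluate on a given graph. The sketch below is illustrative only: it counts labeled copies of $C_4$ through the standard closed-walk identity $\mathrm{tr}(A^4) = 8c_4 + 2\sum_v d_v^2 - 2m$, where $c_4$ is the number of 4-cycles, $d_v$ the degrees and $m$ the number of edges, and it uses an Erdős-Rényi sample as a stand-in for an exponential random graph. The function names are hypothetical.

```python
import random

def gnp(n, p, seed=0):
    """Sample an Erdos-Renyi graph G(n, p) as a 0/1 adjacency matrix."""
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj

def labeled_c4_count(adj):
    """Number of labeled (not necessarily induced) copies of the 4-cycle,
    computed as tr(A^4) - 2 * sum_v d_v^2 + 2 * m."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    m = sum(deg) // 2
    tr_a4 = 0
    for i in range(n):
        for j in range(n):
            # (A^2)_{ij} = number of walks of length 2 from i to j.
            a2 = sum(adj[i][v] * adj[v][j] for v in range(n))
            tr_a4 += a2 * a2
    return tr_a4 - 2 * sum(d * d for d in deg) + 2 * m

if __name__ == "__main__":
    n, p = 60, 0.4
    adj = gnp(n, p, seed=3)
    edges = sum(map(sum, adj)) // 2
    print("edge count / (p n^2 / 2):", edges / (p * n * n / 2))
    print("labeled C4 count / (np)^4:", labeled_c4_count(adj) / (n * p) ** 4)
```

For a pseudo-random graph both printed ratios should approach 1 as $n$ grows; at the small size used here they only indicate the trend.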